Affiliations

¹ Mount Allison University, New Brunswick, Canada

Acknowledgements

¹ Maximilian Berthold and Douglas A. Campbell

1 Overview

File import functions read files into R objects, commonly data frames.

2 Introduction

2.1 We isolated yeast from apple tree bark, oak leaves, and brewer's yeast cultures; the apple, oak bark, and negative control samples did not contain yeast.

2.2 Viability (%) of the isolated yeast strains.

2.3 The specific gravity from the brew of each yeast strain isolated.

2.4 Alcohol tolerance of the yeast strains.

3 Materials & Methods

4 Figure 1: Photos of plated samples, taken during isolation of colonies.

4.1 1A: AppleTreeBark Yeast Sample Grown at 37 degrees Celsius in YPD Broth (SampleID 7)

4.2 1B: Brewers Yeast Sample Grown at 37 degrees Celsius in YPD Broth (SampleID 19)

5 1C: OakLeaves Yeast Sample Grown at 37 degrees Celsius in YPD Broth (SampleID 36)

5.1 1D: Brewers Yeast Sample Grown at 20 degrees Celsius in YM Broth (SampleID 41)

5.2 Set chunk options

Formatted display of content from the .md file on the GitHub site. Upon knitting, figures will be saved to ‘Figs/’.

5.3 Load packages

5.4 InLine Citations of software packages added through the ‘citr’ option under ‘Addins’.

The cited items must be in the .bib file saved in the .Rproj folder; in this case RPackageCitations.bib, generated by exporting a library to .bib from Zotero. Upon export click the ‘Keep Updated’ button in the BetterBibTex menu. If new citations are added to Zotero they will be pushed through to update the exported .bib file in the .Rproj folder. [@bryanGooglesheets4AccessGoogle2021;@mcgowanGoogledriveInterfaceGoogle2020; @wickhamTidyverseEasilyInstall2017]

5.5 Import MetaData from a GoogleSheet

MetaData in a GoogleSheet is more generic than a project-specific .csv edited and saved locally. The GoogleSheet interface can be tricky. Repeatedly reading from GoogleDrive can provoke a throttle from Google.
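One way to reduce repeated reads from GoogleDrive is to cache each sheet locally after the first successful read. The sketch below is a minimal illustration and not part of the original workflow: `read_cached()` and the cache file are hypothetical, and the `fetch` argument stands in for a call such as `read_sheet(sheet_url)`.

```r
# Hypothetical helper: read from a local .rds cache if it exists,
# otherwise call fetch() once and save the result for next time.
read_cached <- function(cache_file, fetch) {
  if (file.exists(cache_file)) {
    return(readRDS(cache_file))
  }
  result <- fetch()
  saveRDS(result, cache_file)
  result
}

# Stand-in for read_sheet(sheet_url); counts how often it is called.
n_calls <- 0
fake_fetch <- function() {
  n_calls <<- n_calls + 1
  data.frame(SampleID = c(7, 19, 36),
             Source = c("AppleTreeBark", "Brewers", "OakLeaves"))
}

cache_file <- tempfile(fileext = ".rds")
first  <- read_cached(cache_file, fake_fetch)  # calls fake_fetch()
second <- read_cached(cache_file, fake_fetch)  # served from the cache
```

In this project the helper might be called as `read_cached("MetaData.rds", function() read_sheet(sheet_url))`; deleting the cache file forces a fresh read from Google.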

# gs4_auth()

#Instead of sending a token, googlesheets4 will send an API key. This can be used to access public resources for which no Google sign-in is required. 
googlesheets4::gs4_deauth()

# Define the Google Sheet URL (replace with your actual URL)
sheet_url <- "https://docs.google.com/spreadsheets/d/17dDzASxhWDbVpQFXb201vT2oB0rkyi2h4gG33ArOaDA/edit?usp=sharing"

# Read the Google Sheet into R as a data frame
ATMetaData <- read_sheet(sheet_url)

# View the imported data (optional; interactive sessions only)
# View(ATMetaData)
# Authenticate with Google Sheets (you only need to do this once)
# gs4_auth()

#Instead of sending a token, googlesheets4 will send an API key. This can be used to access public resources for which no Google sign-in is required. 
googlesheets4::gs4_deauth()

# Define the Google Sheet URL (replace with your actual URL)
sheet_url <- "https://docs.google.com/spreadsheets/d/1HDlxd_bt9I49CQGU18b3h6Df69DG3vQAwbIMZTEdpBQ/edit?usp=sharing"

# Read the Google Sheet into R as a data frame
MetaData <- read_sheet(sheet_url)
#View(MetaData)
# Authenticate with Google Sheets (you only need to do this once)
# gs4_auth()

#Instead of sending a token, googlesheets4 will send an API key. This can be used to access public resources for which no Google sign-in is required. 
googlesheets4::gs4_deauth()

# Define the Google Sheet URL (replace with your actual URL)
sheet_url <- "https://docs.google.com/spreadsheets/d/15Qd3DYkeRX4FCy84Uw9ns9L6Bq6P959csP-opdGvBME/edit?usp=sharing"

# Read the Google Sheet into R as a data frame
SpecificGravityData <- read_sheet(sheet_url)
# Authenticate with Google Sheets (you only need to do this once)
# gs4_auth()

#Instead of sending a token, googlesheets4 will send an API key. This can be used to access public resources for which no Google sign-in is required. 
googlesheets4::gs4_deauth()

# Define the Google Sheet URL (replace with your actual URL)
sheet_url <- "https://docs.google.com/spreadsheets/d/118YF2qZqtC5yCfqyDeRaGQhgm5v8mn6wC_6ae1Rnyik/edit?usp=sharing"

# Read the Google Sheet into R as a data frame
BrothPlateData <- read_sheet(sheet_url)
#View(BrothPlateData)
# Authenticate with Google Sheets (you only need to do this once)
# gs4_auth()

#Instead of sending a token, googlesheets4 will send an API key. This can be used to access public resources for which no Google sign-in is required. 
googlesheets4::gs4_deauth()

# Define the Google Sheet URL (replace with your actual URL)
sheet_url <- "https://docs.google.com/spreadsheets/d/1YN_s-YRdM4j9t_c_l1Z4-Q53KQlURTlDpItovzZNd3g/edit?usp=sharing"

# Read the Google Sheet into R as a data frame
DataDictionnary <- read_sheet(sheet_url)

6 Merge BrothPlateData with MetaData, via left_join().

MergedBrothPlateData <- BrothPlateData %>%
  left_join(MetaData, by = "SampleID")
#head(MergedBrothPlateData)

7 Merge SpecificGravityData with MetaData, via left_join().

MergedSpecificGravityData <- SpecificGravityData %>%
  left_join(MetaData, by = "SampleID")
#head(MergedSpecificGravityData)
#View(MergedSpecificGravityData)
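Because `left_join()` keeps every row of the left table and fills unmatched columns with `NA`, it can be worth checking for `SampleID` values that are missing from MetaData before merging. A minimal sketch with made-up values follows; it uses base R `merge(..., all.x = TRUE)` as a stand-in for `left_join()` so that it runs without extra packages, and the data frames `sg` and `meta` are invented for illustration.

```r
# Invented example tables; SampleID 99 has no metadata entry.
sg <- data.frame(SampleID = c(7, 19, 99),
                 SpecificGravity = c(1.050, 1.012, 1.048))
meta <- data.frame(SampleID = c(7, 19, 36),
                   Source = c("AppleTreeBark", "Brewers", "OakLeaves"))

# SampleIDs present in the measurements but absent from the metadata.
unmatched_ids <- setdiff(sg$SampleID, meta$SampleID)

# Left join: all rows of sg are kept; unmatched rows get NA for Source.
merged <- merge(sg, meta, by = "SampleID", all.x = TRUE)
```

A non-empty `unmatched_ids` suggests a typo or a missing row in the metadata sheet, which would otherwise surface later as `NA` values in plots.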

8 Settings for file import

9 Note: The .tsv file exported from the Molecular Dynamics software with encoding UTF16LE had embedded null characters that caused problems upon import.

As a hacked solution, download the .tsv from ‘Teams’. Import the .tsv into a new GoogleSheet. Delete the first two rows of the file. Export from GoogleSheet as .csv. Move the .csv to the .Rproj folder. A better way would be code to read the problematic UTF16LE file with null characters. The issue relates to a complex file structure exported from Molecular Dynamics software.
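A hedged sketch of reading such a file directly: base R can re-encode on import through the `fileEncoding` argument, and `skip` can drop leading header rows. The file contents below are invented for illustration, and it is an assumption that the real export has exactly two header rows and no other structural quirks.

```r
# Write a small tab-separated file encoded as UTF-16LE (invented contents).
tsv_path <- tempfile(fileext = ".tsv")
con <- file(tsv_path, open = "w", encoding = "UTF-16LE")
writeLines(c(
  "Instrument export header 1",
  "Instrument export header 2",
  "Time\tTempC\tA1",
  "0:00:00\t37\t0.101",
  "1:00:00\t37\t0.152"
), con)
close(con)

# Read it back: declare the encoding and skip the two header rows.
od_raw <- read.delim(tsv_path, fileEncoding = "UTF-16LE", skip = 2,
                     stringsAsFactors = FALSE)
```

If this works on the real export, the GoogleSheet round trip and the later regeneration of wavelength labels might both become unnecessary, but that depends on how the OD680 and OD720 blocks are laid out in the file.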

Project <- "WildYeast"

#set variables for file import & processing
DataPath <- file.path("..", "Data", "RawData", fsep = .Platform$file.sep)
file_id <- ".csv"

# DataOneDrive <- "~/OneDrive - Mount Allison University/BIOL2201_2024/StudentDataTest"

10 List Data file(s)

DataFiles <- list.files(path = DataPath, pattern = file_id, full.names = TRUE)

DataFiles
## [1] "../Data/RawData/24102024_BIOL3111_YeastGroup.csv"
FileEncodeMD <- as.character(guess_encoding(file = DataFiles[1])[1,1])
FileEncodeMD
## [1] "UTF-8"

11 Read ODData file(s) into form readable by R

ODData <- readr::read_csv(file =  DataFiles[1])

#head(ODData)


#data.table::fread(text = readLines(DataFiles[1], skipNul = T))
#code for reading multiple files, if needed
# read_delim_plus <- function(flnm, delimiter, headerrows, fileencode){read_delim(flnm, delim = delimiter,  col_names = TRUE,  skip = headerrows, escape_double = FALSE,  locale = locale(encoding = fileencode), trim_ws = TRUE) |>
#     mutate(Filename = flnm)
#   }
# 
# ODData <- DataFiles |>
#   map_df(~read_delim_plus(flnm = ., delimiter = DelimiterMD,  headerrows = HeaderRowsMD,  fileencode = FileEncodeMD))
# 
# head(ODData)

12 Molecular Dynamics exports problematic variable names; fix them.

colnames(ODData)
##  [1] "Time"            "Temperature(¡C)" "A1"              "A2"             
##  [5] "A3"              "A4"              "A5"              "A6"             
##  [9] "A7"              "A8"              "A9"              "A10"            
## [13] "A11"             "A12"             "B1"              "B2"             
## [17] "B3"              "B4"              "B5"              "B6"             
## [21] "B7"              "B8"              "B9"              "B10"            
## [25] "B11"             "B12"             "C1"              "C2"             
## [29] "C3"              "C4"              "C5"              "C6"             
## [33] "C7"              "C8"              "C9"              "C10"            
## [37] "C11"             "C12"             "D1"              "D2"             
## [41] "D3"              "D4"              "D5"              "D6"             
## [45] "D7"              "D8"              "D9"              "D10"            
## [49] "D11"             "D12"             "E1"              "E2"             
## [53] "E3"              "E4"              "E5"              "E6"             
## [57] "E7"              "E8"              "E9"              "E10"            
## [61] "E11"             "E12"             "F1"              "F2"             
## [65] "F3"              "F4"              "F5"              "F6"             
## [69] "F7"              "F8"              "F9"              "F10"            
## [73] "F11"             "F12"             "G1"              "G2"             
## [77] "G3"              "G4"              "G5"              "G6"             
## [81] "G7"              "G8"              "G9"              "G10"            
## [85] "G11"             "G12"             "H1"              "H2"             
## [89] "H3"              "H4"              "H5"              "H6"             
## [93] "H7"              "H8"              "H9"              "H10"            
## [97] "H11"             "H12"
# Rename column using pattern matching
ODData <- ODData %>%
  rename_with(~ gsub("Temperature.*", "TempC", .x))  # Replace any name starting with "Temperature"
#head(ODData)

13 Lost labels for OD680 & OD720 in hacked import; regenerate them.

#remove blank rows by filtering out rows where TempC is NA
ODData <- ODData |>
  filter(!is.na(TempC))

#this is a hack solution, knowing there are only 15 rows of data
#better way would be to keep the OD settings during import
ODData <- ODData |>
  mutate(nm = ifelse(row_number() %in% c(1:15), 680, 720), .before = Time)

#head(ODData)

14 For plotting it is easier to have data in ‘long’ format, where Well becomes a Value of a Variable, rather than a separate variable for each well.

ODDataLong <- pivot_longer(ODData, cols = -c(nm, Time, TempC), names_to = "Well", values_to = "OD")

ODDataLong |>
  ggplot() +
  geom_point(aes(x = Time, y = OD)) +
  facet_grid(cols = vars(nm))

#head(ODDataLong)

15 Save data in long format (.rds).

saveRDS(ODDataLong, file = file.path("..", "Data", "ProcessedData", "ODDataLong.rds"))

16 Convert time from h:m:s format to numeric format (1,2,3,4,5,6,7,8 etc.)

ODDataLong <- ODDataLong %>% 
  mutate(Time_numeric = as.numeric(hms(Time)) / 3600)  # Converts to hours
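The conversion amounts to hours = h + m/60 + s/3600. The base R sketch below spells out that arithmetic on a few invented time strings; `hms_to_hours()` is a hypothetical helper for illustration and is not used elsewhere in this document.

```r
# Hypothetical helper: convert "h:m:s" strings to decimal hours.
hms_to_hours <- function(x) {
  parts <- strsplit(as.character(x), ":", fixed = TRUE)
  vapply(parts, function(p) {
    p <- as.numeric(p)
    p[1] + p[2] / 60 + p[3] / 3600
  }, numeric(1))
}

hours <- hms_to_hours(c("0:00:00", "1:30:00", "15:00:00"))
```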

17 Merge ODDataLong with Metadata before attempting data analyses.

# Merge ODData with ATMetaData
Merged_ATData <- ODDataLong %>%
  left_join(ATMetaData, by = "Well")
#View(Merged_ATData)

18 Filter out wells that had no media/inoculant to avoid problems in analysis.

# Exclude wells G10, G11, G12, H10, H11, H12
Merged_ATData <- Merged_ATData %>%
  filter(!Well %in% c("G10", "G11", "G12", "H10", "H11", "H12"))

# Verify
# table(Merged_ATData$Well) 

19 Filter merged data by wavelength, in order to make growth curves for OD720 and OD680.

# Filter data for OD 680 nm
Merged_ATData_680 <- Merged_ATData %>%
  filter(nm == 680)

# Filter data for OD 720 nm
Merged_ATData_720 <- Merged_ATData %>%
  filter(nm == 720)

20 Double-check that time is in numeric form for analyses.

Merged_ATData <- Merged_ATData %>% 
  mutate(Time_hour = as.numeric(hms(Time)) / 3600)  # Converts to hours

21 Verify data structure prior to spline fitting for MuMax; a problematic structure will cause errors.

# Verify dataset structure
# str(Merged_ATData)  
# head(Merged_ATData) 

22 Figure 2: The change in absorbance at 720 nm over 15 hours of each strain when exposed to varying ethanol concentrations of 2 – 12 % (v/v).

Merged_ATData |>
  filter(nm == 720)|>
ggplot(aes(x = Time_hour, y = OD, color = as.factor(SampleID), group = Well)) +
  geom_line() +
  geom_point() +
  facet_grid(rows = vars(EthanolConcentration)) + 
#  labs(
 #   title = "Growth Curves for Each Sample (OD at 720 nm)",
  #  x = "Time (hours)", 
   # y = "Optical Density (OD720)",
    #color = "Ethanol (%)"
  #) +
  theme_minimal() +
  theme(
    text = element_text(size = 12),
    legend.position = "bottom"  
  )

23 Figure 3: The change in absorbance at 680 nm over 15 hours of each strain when exposed to varying ethanol concentrations of 2 – 12 % (v/v).

Merged_ATData |>
  filter(nm == 680)|>
ggplot(aes(x = Time_numeric, y = OD, color = as.factor(SampleID), group = Well)) +
  geom_line() +
  geom_point() +
  facet_grid(rows = vars(EthanolConcentration)) + 
  # labs(
  #   title = "Growth Curves for Each Sample (OD at 680 nm)",
  #   x = "Time (hours)",
  #   y = "Optical Density (OD680)",
  #   color = "Sample ID"
  # ) +
  theme_minimal() +
  theme(
    text = element_text(size = 12),
    legend.position = "bottom"  
  )

24 Spline Fitting for Determining MuMax and R-squared for all Wells.

# Step 2: Spline Fitting and Parameter Extraction
SplineRates_nest <- Merged_ATData |>
  group_by(nm, Well, Source, SampleID, EthanolConcentration, BrothType, GrowthTempC) |>
  nest() |>
  mutate(
    # Fit a spline per well; return NULL if the fit fails
    SplineFit = purrr::map(data, ~tryCatch(
      growthrates::fit_spline(.$Time_hour, .$OD),
      error = function(e) NULL
    )),
    # Extract MuMax and R-squared; NA when the fit failed
    Mumax_hour = purrr::map_dbl(SplineFit, ~if (!is.null(.)) pluck(., "par", "mumax") else NA_real_),
    rsquared = purrr::map_dbl(SplineFit, ~if (!is.null(.)) pluck(., "rsquared") else NA_real_)
  ) |>
  ungroup()

SplineRatesResults <- SplineRates_nest |> 
  select(nm, Well, Mumax_hour, rsquared, Source, SampleID, EthanolConcentration, BrothType, GrowthTempC)

# View the results
# print(SplineRatesResults)
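The idea behind a spline-based MuMax can be illustrated with base R alone: fit a smoothing spline to log-transformed OD against time and take the maximum of its first derivative. The sketch below is conceptual, using `stats::smooth.spline()` on an invented logistic growth curve; it is not a reproduction of the internals of `growthrates::fit_spline()`, whose smoothing settings may differ.

```r
# Synthetic logistic growth curve (invented parameters, no noise).
r  <- 0.5    # intrinsic growth rate, per hour
K  <- 1.0    # carrying capacity (OD units)
y0 <- 0.01   # starting OD
time_h <- seq(0, 15, by = 0.5)
od <- K / (1 + (K / y0 - 1) * exp(-r * time_h))

# Smoothing spline through log(OD); its first derivative is the
# specific growth rate at each time point.
fit <- smooth.spline(time_h, log(od))
grid <- seq(min(time_h), max(time_h), length.out = 300)
slope <- predict(fit, x = grid, deriv = 1)$y

mumax_est  <- max(slope)              # maximum specific growth rate, per hour
t_at_mumax <- grid[which.max(slope)]  # time at which it occurs
```

With real plate-reader data the estimate is sensitive to noise at low OD, which is one reason to inspect the R-squared of each fit before interpreting MuMax.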

25 Clean the spline results, “SplineRatesResults”, before plotting MuMax vs EtOH concentration.

# Remove rows with any NA values
SplineRatesResults <- na.omit(SplineRatesResults)

# Check the result
# head(SplineRatesResults)

#View(SplineRatesResults)

26 Fit Linear Models

#define exponential decay function for data fitting.
#exp_decay <- function(x, i, mu){y = i * exp(mu * x)}

#need to merge left_join(Merged_ATMetaData_MuMax_clean, true MetaData so you can talk about growth responses to EtOH in a sensible manner
 # left_join(Merged_ATMetaData_MuMax_clean, 

MuEtOH_nest <- SplineRatesResults |>
  nest(.by = c("SampleID", "nm", "Source", "BrothType", "GrowthTempC")) |> #want to nest by source
  mutate(LinearFit = purrr::map(data, ~lm(Mumax_hour ~ EthanolConcentration,
                                            data = .x)),
         LinearTidy = purrr::map(LinearFit, tidy),
         LinearParam = purrr::map(LinearFit, glance),
         LinearPredict = purrr::map(LinearFit, augment))
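As a hedged illustration of what the nested fit computes, the base R sketch below fits one linear model per sample and extracts the slope of MuMax against ethanol concentration. The data are invented, and `split()` with `lapply()` stands in for `nest()` with `purrr::map()`.

```r
# Invented MuMax values for two hypothetical samples.
rates <- data.frame(
  SampleID = rep(c(19, 36), each = 4),
  EthanolConcentration = rep(c(2, 4, 8, 12), times = 2),
  Mumax_hour = c(0.20 - 0.01 * c(2, 4, 8, 12),  # declines with ethanol
                 rep(0.05, 4))                  # flat response
)

# One linear model per SampleID; keep intercept and slope.
fits <- lapply(split(rates, rates$SampleID), function(d) {
  coef(lm(Mumax_hour ~ EthanolConcentration, data = d))
})
slopes <- vapply(fits, function(cf) cf[["EthanolConcentration"]], numeric(1))
```

A negative slope indicates that growth rate falls as ethanol concentration rises; a slope near zero indicates little response over the tested range.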

27 Figure 4: The relationship between maximum growth rate (MuMax, per hour) and ethanol concentration (% v/v) of the isolated yeast strains, faceted by source and culturing conditions.

#Set label for GrowthTempC
temp_labeller <- function(value) {
  paste(value, "°C")
}
nm_labeller <- function(value) {
  paste(value, "nm")
}
#Plot Linear model
MuEtOH_nest |>
  unnest(cols = c(LinearPredict)) |>
  ggplot() +
  geom_point(aes(x = EthanolConcentration, y = Mumax_hour)) +
  geom_line(aes(x = EthanolConcentration , y = .fitted)) +
  #geom_point(aes(x = Ethanol (v/v %), y = .resid), colour = "red") +
  facet_grid(cols = vars(Source, BrothType, GrowthTempC), rows = vars(nm), labeller = labeller(GrowthTempC = temp_labeller , nm = nm_labeller)) +
   labs(x = "Ethanol Concentration % (v/v)") +
  theme_bw()

28 Show fit parameters

MuEtOH_nest |>
unnest(cols = c(LinearTidy)) |>
 select(-c(data, LinearFit, LinearParam, LinearPredict)) |>
  select(-c(statistic)) |>
  pivot_wider(id_cols = c(SampleID, nm), names_from = term, values_from = c(estimate, std.error, p.value)) |>
  kable()
| SampleID | nm | estimate_(Intercept) | estimate_EthanolConcentration | std.error_(Intercept) | std.error_EthanolConcentration | p.value_(Intercept) | p.value_EthanolConcentration |
|---:|---:|---:|---:|---:|---:|---:|---:|
| 19 | 680 | 0.2030007 | -0.0078177 | 0.0064375 | 0.0008022 | 0.0000000 | 0.0000002 |
| 36 | 680 | 0.0011525 | 0.0001520 | 0.0035337 | 0.0004403 | 0.7495071 | 0.7355055 |
| 18 | 680 | 0.2551484 | -0.0065962 | 0.0344146 | 0.0042884 | 0.0000051 | 0.1479938 |
| 43 | 680 | 0.4995635 | -0.0277691 | 0.0457810 | 0.0057048 | 0.0000001 | 0.0003073 |
| 41 | 680 | 0.4418388 | -0.0212349 | 0.0541076 | 0.0067424 | 0.0000018 | 0.0076797 |
| 7 | 680 | -0.0088181 | 0.0025696 | 0.0062424 | 0.0007779 | 0.1812614 | 0.0057114 |
| 19 | 720 | 0.2376157 | -0.0089990 | 0.0060420 | 0.0007529 | 0.0000000 | 0.0000000 |
| 36 | 720 | -0.0008505 | 0.0006321 | 0.0052433 | 0.0006534 | 0.8736381 | 0.3509783 |
| 18 | 720 | 0.2174016 | -0.0062407 | 0.0150228 | 0.0018720 | 0.0000000 | 0.0053870 |
| 43 | 720 | 0.3732816 | -0.0170660 | 0.0434841 | 0.0054186 | 0.0000010 | 0.0076787 |
| 41 | 720 | 0.3378377 | -0.0151847 | 0.0441729 | 0.0055044 | 0.0000036 | 0.0162675 |
| 7 | 720 | -0.0070542 | 0.0021570 | 0.0048503 | 0.0006044 | 0.1695555 | 0.0034301 |

29 Figure 5: The percent of living yeast compared to total yeast counts and the culturing conditions of the isolated yeast.

29.1 YM broth at 37 °C gave the highest percentage of living cells relative to total cells.

MergedBrothPlateData |>  
filter(!is.na(SampleID)) |>  
filter(!Source %in% c("Air", "Apple", "OakTreeBark")) |>  
ggplot(aes(x = as.factor(Source), y = ViabilityPercent, fill = BrothType, color = BrothType, shape = as.factor(GrowthTempC))) +
geom_point(size = 4) + 
coord_cartesian(ylim = c(0, 100)) +
geom_text(aes(label = round(ViabilityPercent, 1)), vjust = -0.5, hjust = 1.5, size = 3.25, color = "black") +
scale_x_discrete(drop = TRUE) +  
labs(title = "Viability Percent by Source, Broth Type and Temperature", x = "Source",
y = "Viability Percent (%)",
color = "Broth Type",
fill = "Broth Type",
shape = "Growth Temperature (°C)"  # update legend title for GrowthTempC
) +
theme_minimal(base_size = 12) +
theme(strip.text = element_text(size = 10, face = "bold"),
axis.text.x = element_text(angle = 0, hjust = 0.5),
plot.title = element_text(hjust = 0.5, face = "bold", size = 14)
) + scale_shape_discrete(labels = function(x) paste(x, "°C"))  # attach °C to legend items

30 Final alcohol percent of each sample tested

# Find initial Specific Gravity (SG) at Time = 0
#initial_sg_values <- MergedSpecificGravityData %>%
#filter(Time == 0) %>%
#select(SampleID, SpecificGravity) %>%
#rename(InitialSG = SpecificGravity)

# Find final SG for each SampleID
#final_sg_values <- MergedSpecificGravityData %>%
#group_by(SampleID) %>%
#summarize(FinalSG = min(SpecificGravity, na.rm = TRUE), .groups = "drop")

# merge initial SG and final SG into a single dataset
#combined_sg_values <- merge(initial_sg_values, final_sg_values, by = "SampleID")

# calculate ABV 
#combined_sg_values <- combined_sg_values %>%
#mutate(ABV = (InitialSG - FinalSG) * 131.25)
#print(combined_sg_values)
#View(combined_sg_values)
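A worked example of the ABV approximation in the commented-out code above, ABV = (InitialSG - FinalSG) × 131.25, using invented specific gravity readings rather than values from this experiment:

```r
initial_sg <- 1.050   # invented starting specific gravity
final_sg   <- 1.010   # invented final specific gravity

# Approximate alcohol by volume (%) from the drop in specific gravity.
abv <- (initial_sg - final_sg) * 131.25
```

A drop of 0.040 in specific gravity therefore corresponds to roughly 5.25 % ABV under this approximation.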

31 Figure 6: The change in specific gravity during the 7-day brewing process of each yeast strain and the culturing conditions of the isolated yeast.

31.1 Lag phase of 4 days for the yeast from apple tree bark, 3 days for the yeast from oak leaves, and 1 day for the brewer's yeast.

31.2 Brewer's yeast is the only strain which appears to have reached the end point of the brew.

31.3 Culturing conditions appear to have little to no effect on fermenting capabilities.

ggplot(MergedSpecificGravityData, 
aes(x = Time, y = SpecificGravity, color = BrothType, shape = as.factor(GrowthTempC))) +
geom_line(size = 1) + 
geom_point(size = 2) + 
coord_cartesian(ylim = c(0, 1.5)) +
facet_grid(cols = vars(Source)) +  # facet by Source
labs(x = "Time (hours)", 
y = "Specific Gravity", 
color = "Broth Type", 
shape = "Growth Temperature (°C)",  # legend title for GrowthTempC
title = "Specific Gravity Over Time by Source") +
theme_minimal(base_size = 12) +
theme(plot.title = element_text(hjust = 0.5, face = "bold", size = 14), 
strip.text = element_text(size = 10, face = "bold"), 
axis.text.x = element_text(angle = 0, hjust = 1)) +
scale_shape_discrete(labels = function(x) paste(x, "°C"))  # attach °C to legend items

32 The following code is for the rate of fermentation versus time for each sample type; this shows whether the rate of alcohol production changes as the brewing time increases

# Calculate rate of change in Specific Gravity
# MergedSpecificGravityData <- MergedSpecificGravityData %>%
# group_by(SampleID) %>%
# mutate(SGRate = c(NA, diff(SpecificGravity) / diff(Time)))
# Filter missing values to allow connected lines
# MergedSpecificGravityData <- MergedSpecificGravityData %>%
# filter(!is.na(SGRate) & !is.na(Time))

# Plot with facets by Source and colors by SampleID
# ggplot(MergedSpecificGravityData, aes(x = Time, y = SGRate, color = as.factor(SampleID), group = SampleID)) +
# geom_line(size = 1, na.rm = TRUE) + 
# geom_point(size = 2) +  
# facet_grid(cols = vars(Source)) +  # Facet by Source
# labs(x = "Time",
# y = "Specific Gravity Rate of Change",
# color = "Sample ID",  
# title = "Rate of Change of Specific Gravity Over Time by Source") +
# theme_minimal(base_size = 12) +
# theme(plot.title = element_text(hjust = 0.5, face = "bold", size = 14),
# strip.text = element_text(size = 10, face = "bold"),
# axis.text.x = element_text(angle = 0, hjust = 1))

33 Making a graph to show how long each brew took to reach the final SG

# Calculate completion time for each SampleID
# completion_times <- MergedSpecificGravityData %>%
# group_by(SampleID, Source) %>%  # Include Source for grouping
# filter(SpecificGravity == min(SpecificGravity, na.rm = TRUE)) %>%
# summarize(Completion_Time = min(Time), .groups = "drop")

# Plot of Completion Times without facets
# ggplot(completion_times, aes(x = factor(SampleID), y = Completion_Time, color = Source)) +
# geom_point(size = 4) +
# geom_text(aes(label = round(Completion_Time, 1)), vjust = -0.75, hjust = 0.75, size = 5, color = "black") +
# labs(title = "Completion Time by Source and Sample ID",x = "Sample ID", y = "Completion Time (hours)", color = "Source") +
# theme_minimal(base_size = 12) +
# theme(plot.title = element_text(hjust = 0.5, face = "bold", size = 14), strip.text = element_text(size = 10, face = "bold"), axis.text.x = element_text(angle = 45, hjust = 1),
# legend.position = "right")

34 Figure 7: The percent of sugar consumed (the apparent attenuation) by the yeast strains from the initial malt, faceted by source and culturing conditions. The attenuation of a sample indicates the efficiency of the yeast strain by comparing the sugar content of the initial wort to the amount of sugar left over, giving the percent of sugars consumed and converted to alcohol or CO2.

34.1 Culturing conditions have little to no effect on the amount of sugars consumed.

34.2 Brewer's yeast consumed considerably more sugar, because the apple tree bark and oak leaf yeasts had a prolonged lag phase and their brews had not reached completion.

# Calculate Apparent Attenuation for Each Sample
AttenuationData <- MergedSpecificGravityData %>%
group_by(Source , BrothType , GrowthTempC) %>%  # Include Source for grouping
summarize(InitialSG = max(SpecificGravity, na.rm = TRUE),  # Initial SG is the max
FinalSG = min(SpecificGravity, na.rm = TRUE),    # Final SG is the min
.groups = "drop") %>%
mutate(ApparentAttenuation = ((InitialSG - FinalSG) / (InitialSG - 1)) * 100)

# Plot Apparent Attenuation Data by Source
ggplot(AttenuationData, aes(x = Source, y = ApparentAttenuation, color = BrothType, shape = as.factor(GrowthTempC))) +
geom_point(size = 4) +
coord_cartesian(ylim = c(0, 100)) +
geom_text(aes(label = round(ApparentAttenuation, 1)), vjust = -0.5, hjust = 1.5, size = 3.25, color = "black") +
labs(title = "Apparent Attenuation (%) by Source, Broth Type and Temperature", x = "Source", y = "Apparent Attenuation (%)", color = "Broth Type",
shape = "Growth Temperature (°C)"  # update legend title for GrowthTempC
) +
theme_minimal(base_size = 12) +
theme(plot.title = element_text(hjust = 0.5, face = "bold", size = 14), strip.text = element_text(size = 10, face = "bold"),
axis.text.x = element_text(angle = 0, hjust = 0.5),
legend.position = "right") + 
scale_shape_discrete(labels = function(x) paste(x, "°C"))  # attach °C to legend items
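A worked example of the apparent attenuation formula used above, (InitialSG - FinalSG) / (InitialSG - 1) × 100, with invented specific gravity readings; `apparent_attenuation()` is a hypothetical helper for illustration only.

```r
# Hypothetical helper implementing the formula from the mutate() call above.
apparent_attenuation <- function(initial_sg, final_sg) {
  (initial_sg - final_sg) / (initial_sg - 1) * 100
}

# Invented readings: wort starts at 1.050 and finishes at 1.010.
aa <- apparent_attenuation(initial_sg = 1.050, final_sg = 1.010)
```

Here 0.040 of the 0.050 gravity points above water were consumed, giving about 80 % apparent attenuation. Note that taking the maximum specific gravity as the initial value, as the code above does, assumes that no reading exceeds the true starting gravity.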

35 Bibliography